Wiki Vandalysis - Wikipedia Vandalism Analysis
Abstract
Wikipedia describes itself as the “free encyclopedia that anyone can edit”. Alongside the volunteers who improve its articles, a large number of malicious users exploit Wikipedia's open nature to vandalize them. As Wikipedia grows, deterring and reverting vandalism has become one of its major challenges. Wikipedia editors fight vandalism both manually and with automated bots that use regular expressions and other simple rules to recognize malicious edits[5]. Researchers have also proposed machine learning algorithms for vandalism detection[19,15], but these approaches are still in their infancy and leave much room for improvement. This paper presents an approach that extracts a variety of features from edits and uses them for machine learning classification. Our classifier uses information about the editor, the sentiment of the edit, the “quality” of the edit (e.g., spelling errors), and targeted regular expressions that capture patterns common in blatant vandalism, such as the insertion of obscene words or repeated exclamation marks. We achieve an area under the ROC curve (AUC) of 0.91 on a training set of 15,000 human-annotated edits and 0.887 on a random sample of 17,472 edits drawn from a pool of 317,443.
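To make two of the ingredients above concrete, the following is a minimal Python sketch of (a) regular-expression features of the kind the abstract describes and (b) the AUC metric used for evaluation. The word list, the patterns, and the extra `repeated_char_count` and `upper_ratio` features are hypothetical placeholders, not the authors' actual rules, and the classifier itself is not shown. AUC is computed here as the probability that a randomly chosen vandalism edit receives a higher score than a randomly chosen regular edit (ties count as one half), which equals the area under the ROC curve.

```python
import re

# Hypothetical placeholder lexicon; NOT the authors' list of obscene words.
OBSCENE_WORDS = {"stupid", "idiot", "sucks"}

MULTI_EXCLAIM = re.compile(r"!{2,}")       # runs of two or more "!"
REPEATED_CHAR = re.compile(r"(.)\1{4,}")   # hypothetical: a character repeated 5+ times, e.g. "lllll"
WORD = re.compile(r"[A-Za-z']+")


def regex_features(inserted_text: str) -> dict:
    """Count simple vandalism patterns in the text an edit inserts (illustrative features only)."""
    words = [w.lower() for w in WORD.findall(inserted_text)]
    n_upper = sum(1 for c in inserted_text if c.isupper())
    n_alpha = sum(1 for c in inserted_text if c.isalpha())
    return {
        "obscene_count": sum(1 for w in words if w in OBSCENE_WORDS),
        "multi_exclaim_count": len(MULTI_EXCLAIM.findall(inserted_text)),
        "repeated_char_count": len(REPEATED_CHAR.findall(inserted_text)),
        # Hypothetical extra feature: share of letters that are uppercase.
        "upper_ratio": n_upper / n_alpha if n_alpha else 0.0,
    }


def auc(scores, labels):
    """AUC via pairwise comparison: P(score of a vandal edit > score of a regular edit)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(regex_features("You are STUPID!!! lolllll"))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of edits; for tens of thousands of edits, a rank-based (Mann-Whitney U) computation or a library routine would be the practical choice, but the loop makes the definition explicit.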
Related articles
Wiki Vandalysis - Wikipedia Vandalism Analysis - Lab Report for PAN at CLEF 2010
Automatic Vandalism Detection in Wikipedia
We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature by now. In this paper we discuss the characteristics...
Building Automated Vandalism Detection Tools for Wikidata
Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open collaboration model is powerful in that it reduces barriers to participation and allows a large number of people to contribute. However, it exposes the knowledge base to the risk of vandalism and low-quality contributions. In this work, we build on past work detecting vandalism in Wikipedia to detect vandalism in Wiki...
An Empirical Research: "Wikipedia Vandalism Detection using VandalSense 2.0" - Notebook for PAN at CLEF 2011
Wikipedia despite having a very small budget has been among the top ten most visited websites for over half a decade. Being this visible also generated the problem of ill intended people modifying Wikipedia in a destructive manner. VandalSense is an experimental tool programmed by F. Gediz Aksit to automatically identify vandalism on Wikipedia through the use of machine learning and text mining...
Making your database available through Wikipedia: the pros and cons
Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content; with many pages written on scientific subject matters that include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 ar...
Publication date: 2010